Deforestation and CO2 Emission
import plotly.io as pio
pio.renderers.default = "vscode+jupyterlab+notebook_connected"
Deforestation has been increasing around the world, driven largely by industrialization, and it plays a significant role in climate change. This issue caught my attention, and I am interested in studying how deforestation relates to CO2 emissions. By understanding this connection, we can better grasp how deforestation acts as a driver of climate change.
Research Focus
Research Question:
How does deforestation correlate with CO2 emissions globally from 2001 to 2019?
Hypothesis:
Higher rates of deforestation contribute to increased CO2 emissions, with significant regional differences.
Data Sources
Forest Dataset:
Sourced from Global Forest Watch, an organization dedicated to providing real-time data and tools for monitoring forests worldwide.
CO2 Emission Dataset:
Obtained from Kaggle, which offers a dataset on CO2 emissions, growth, and population by country.
Step 1 - Import Library and Data Management
import pandas as pd
import plotly.express as px
forest = pd.read_csv("treecover_loss__ha_1.csv")
forest
| | iso | umd_tree_cover_loss__year | umd_tree_cover_loss__ha |
|---|---|---|---|
| 0 | AFG | 2001 | 88.092712 |
| 1 | AGO | 2001 | 101220.621525 |
| 2 | AIA | 2001 | 3.878461 |
| 3 | ALA | 2001 | 396.934826 |
| 4 | ALB | 2001 | 3729.021031 |
| ... | ... | ... | ... |
| 4566 | XKO | 2023 | 1465.438575 |
| 4567 | XNC | 2023 | 41.029104 |
| 4568 | ZAF | 2023 | 29571.219239 |
| 4569 | ZMB | 2023 | 190416.586825 |
| 4570 | ZWE | 2023 | 5690.371581 |
4571 rows × 3 columns
forest.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4571 entries, 0 to 4570
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 iso 4571 non-null object
1 umd_tree_cover_loss__year 4571 non-null int64
2 umd_tree_cover_loss__ha 4571 non-null float64
dtypes: float64(1), int64(1), object(1)
memory usage: 107.3+ KB
Step 2 - Data Preprocessing
The forest dataset from Global Forest Watch includes a column named iso, which contains country codes following the International Organization for Standardization (ISO) standard. However, it does not include country names or regional information.
To address this, we need to load an additional dataset containing ISO codes alongside country names and regions. This will allow us to link the forest data to the corresponding countries for further analysis.
Based on the output of the info() function, we can confirm that the data types for each column are appropriate for analysis.
To improve readability, we rename the column umd_tree_cover_loss__year to year.
forest.rename(columns={
'umd_tree_cover_loss__year': 'year',
}, inplace=True)
forest
| | iso | year | umd_tree_cover_loss__ha |
|---|---|---|---|
| 0 | AFG | 2001 | 88.092712 |
| 1 | AGO | 2001 | 101220.621525 |
| 2 | AIA | 2001 | 3.878461 |
| 3 | ALA | 2001 | 396.934826 |
| 4 | ALB | 2001 | 3729.021031 |
| ... | ... | ... | ... |
| 4566 | XKO | 2023 | 1465.438575 |
| 4567 | XNC | 2023 | 41.029104 |
| 4568 | ZAF | 2023 | 29571.219239 |
| 4569 | ZMB | 2023 | 190416.586825 |
| 4570 | ZWE | 2023 | 5690.371581 |
4571 rows × 3 columns
Step 3 - Load the Continents Data
continents = pd.read_csv('continents2.csv')
continents
| | name | alpha-2 | alpha-3 | country-code | iso_3166-2 | region | sub-region | intermediate-region | region-code | sub-region-code | intermediate-region-code |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | AF | AFG | 4 | ISO 3166-2:AF | Asia | Southern Asia | NaN | 142.0 | 34.0 | NaN |
| 1 | Åland Islands | AX | ALA | 248 | ISO 3166-2:AX | Europe | Northern Europe | NaN | 150.0 | 154.0 | NaN |
| 2 | Albania | AL | ALB | 8 | ISO 3166-2:AL | Europe | Southern Europe | NaN | 150.0 | 39.0 | NaN |
| 3 | Algeria | DZ | DZA | 12 | ISO 3166-2:DZ | Africa | Northern Africa | NaN | 2.0 | 15.0 | NaN |
| 4 | American Samoa | AS | ASM | 16 | ISO 3166-2:AS | Oceania | Polynesia | NaN | 9.0 | 61.0 | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 244 | Wallis and Futuna | WF | WLF | 876 | ISO 3166-2:WF | Oceania | Polynesia | NaN | 9.0 | 61.0 | NaN |
| 245 | Western Sahara | EH | ESH | 732 | ISO 3166-2:EH | Africa | Northern Africa | NaN | 2.0 | 15.0 | NaN |
| 246 | Yemen | YE | YEM | 887 | ISO 3166-2:YE | Asia | Western Asia | NaN | 142.0 | 145.0 | NaN |
| 247 | Zambia | ZM | ZMB | 894 | ISO 3166-2:ZM | Africa | Sub-Saharan Africa | Eastern Africa | 2.0 | 202.0 | 14.0 |
| 248 | Zimbabwe | ZW | ZWE | 716 | ISO 3166-2:ZW | Africa | Sub-Saharan Africa | Eastern Africa | 2.0 | 202.0 | 14.0 |
249 rows × 11 columns
After loading the continent data, we can drop unnecessary columns and retain only the country (stored as name), region, and sub-region information, keyed by the alpha-3 ISO code. Additionally, we rename the name column to country for readability, and rename region and sub-region to continent_region and continent_sub_region.
# Keep only the ISO code, country name, and region columns; copy so the rename does not act on a view.
continents_region = continents[['alpha-3', 'name', 'region', 'sub-region']].copy()
continents_region.rename(columns={'region': 'continent_region',
                                  'name': 'country',
                                  'sub-region': 'continent_sub_region'}, inplace=True)
continents_region
| | alpha-3 | country | continent_region | continent_sub_region |
|---|---|---|---|---|
| 0 | AFG | Afghanistan | Asia | Southern Asia |
| 1 | ALA | Åland Islands | Europe | Northern Europe |
| 2 | ALB | Albania | Europe | Southern Europe |
| 3 | DZA | Algeria | Africa | Northern Africa |
| 4 | ASM | American Samoa | Oceania | Polynesia |
| ... | ... | ... | ... | ... |
| 244 | WLF | Wallis and Futuna | Oceania | Polynesia |
| 245 | ESH | Western Sahara | Africa | Northern Africa |
| 246 | YEM | Yemen | Asia | Western Asia |
| 247 | ZMB | Zambia | Africa | Sub-Saharan Africa |
| 248 | ZWE | Zimbabwe | Africa | Sub-Saharan Africa |
249 rows × 4 columns
Step 4 - Data Merging and Further Cleaning
We are now ready to merge the deforestation dataset with the continent data to include detailed regional information.
forest_region = pd.merge(
forest,
continents_region,
left_on='iso',
right_on='alpha-3',
how='left'
)
forest_region.drop(columns=['alpha-3'], inplace=True)
forest_region
| | iso | year | umd_tree_cover_loss__ha | country | continent_region | continent_sub_region |
|---|---|---|---|---|---|---|
| 0 | AFG | 2001 | 88.092712 | Afghanistan | Asia | Southern Asia |
| 1 | AGO | 2001 | 101220.621525 | Angola | Africa | Sub-Saharan Africa |
| 2 | AIA | 2001 | 3.878461 | Anguilla | Americas | Latin America and the Caribbean |
| 3 | ALA | 2001 | 396.934826 | Åland Islands | Europe | Northern Europe |
| 4 | ALB | 2001 | 3729.021031 | Albania | Europe | Southern Europe |
| ... | ... | ... | ... | ... | ... | ... |
| 4566 | XKO | 2023 | 1465.438575 | NaN | NaN | NaN |
| 4567 | XNC | 2023 | 41.029104 | NaN | NaN | NaN |
| 4568 | ZAF | 2023 | 29571.219239 | South Africa | Africa | Sub-Saharan Africa |
| 4569 | ZMB | 2023 | 190416.586825 | Zambia | Africa | Sub-Saharan Africa |
| 4570 | ZWE | 2023 | 5690.371581 | Zimbabwe | Africa | Sub-Saharan Africa |
4571 rows × 6 columns
As observed, there are still several ISO codes that are not included in our region dataset. To address this, we need to identify these missing codes. If possible, we should supplement the dataset by filling in the appropriate country names and regional information from other reliable sources.
nan_data = forest_region[forest_region['continent_region'].isna()]
nan_data
| | iso | year | umd_tree_cover_loss__ha | country | continent_region | continent_sub_region |
|---|---|---|---|---|---|---|
| 200 | XAD | 2001 | 1.648989 | NaN | NaN | NaN |
| 201 | XCA | 2001 | 9.735251 | NaN | NaN | NaN |
| 202 | XKO | 2001 | 1122.205429 | NaN | NaN | NaN |
| 203 | XNC | 2001 | 17.661583 | NaN | NaN | NaN |
| 407 | XAD | 2002 | 0.507202 | NaN | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... |
| 4372 | XKO | 2022 | 784.685260 | NaN | NaN | NaN |
| 4373 | XNC | 2022 | 527.159030 | NaN | NaN | NaN |
| 4565 | XCA | 2023 | 0.989440 | NaN | NaN | NaN |
| 4566 | XKO | 2023 | 1465.438575 | NaN | NaN | NaN |
| 4567 | XNC | 2023 | 41.029104 | NaN | NaN | NaN |
87 rows × 6 columns
nan_data['iso'].unique()
array(['XAD', 'XCA', 'XKO', 'XNC'], dtype=object)
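As an alternative check, the unmatched codes can also be listed at merge time. The sketch below uses two small made-up frames (forest_demo and regions_demo are hypothetical stand-ins, not the real files) together with the indicator=True option of pd.merge, which adds a _merge column marking rows that were found only in the left table.

```python
import pandas as pd

# Hypothetical miniature stand-ins for the forest and region tables.
forest_demo = pd.DataFrame({
    "iso": ["AFG", "XKO", "ZWE", "XCA"],
    "umd_tree_cover_loss__ha": [88.1, 1122.2, 5690.4, 9.7],
})
regions_demo = pd.DataFrame({
    "alpha-3": ["AFG", "ZWE"],
    "country": ["Afghanistan", "Zimbabwe"],
})

# indicator=True adds a "_merge" column; in a left join its values are
# "both" (matched) or "left_only" (no match in the region lookup).
merged_demo = forest_demo.merge(
    regions_demo, left_on="iso", right_on="alpha-3",
    how="left", indicator=True,
)

# Codes present in the forest data but absent from the region lookup.
unmatched = sorted(
    merged_demo.loc[merged_demo["_merge"] == "left_only", "iso"].unique()
)
print(unmatched)
```

This gives the same list as filtering on missing values, but it does not depend on which column happens to be empty after the join.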
According to ISO 3166-1, alpha-3 codes starting with ‘X’ are reserved for user-assigned purposes and do not officially represent recognized countries. However, such codes are often used in datasets to denote specific regions or entities. In this analysis, the codes are interpreted as follows:
XAD: Treated as Andorra.
XKO: Treated as Kosovo.
XNC: Treated as New Caledonia.
The code XCA could not be matched to a documented country, so I have decided to drop it from the analysis. These mappings are working assumptions and should be checked against the data provider's own code list.
update_values = {
'XAD': {'country': 'Andorra', 'continent_region': 'Europe', 'continent_sub_region': 'Southern Europe'},
'XKO': {'country': 'Kosovo', 'continent_region': 'Europe', 'continent_sub_region': 'Southern Europe'},
'XNC': {'country': 'New Caledonia', 'continent_region': 'Oceania', 'continent_sub_region': 'Melanesia'}
}
for iso, values in update_values.items():
    # The dict keys are in the same order as the target columns, so the values line up.
    forest_region.loc[forest_region['iso'] == iso,
                      ['country', 'continent_region', 'continent_sub_region']] = list(values.values())
forest_region
| | iso | year | umd_tree_cover_loss__ha | country | continent_region | continent_sub_region |
|---|---|---|---|---|---|---|
| 0 | AFG | 2001 | 88.092712 | Afghanistan | Asia | Southern Asia |
| 1 | AGO | 2001 | 101220.621525 | Angola | Africa | Sub-Saharan Africa |
| 2 | AIA | 2001 | 3.878461 | Anguilla | Americas | Latin America and the Caribbean |
| 3 | ALA | 2001 | 396.934826 | Åland Islands | Europe | Northern Europe |
| 4 | ALB | 2001 | 3729.021031 | Albania | Europe | Southern Europe |
| ... | ... | ... | ... | ... | ... | ... |
| 4566 | XKO | 2023 | 1465.438575 | Kosovo | Europe | Southern Europe |
| 4567 | XNC | 2023 | 41.029104 | New Caledonia | Oceania | Melanesia |
| 4568 | ZAF | 2023 | 29571.219239 | South Africa | Africa | Sub-Saharan Africa |
| 4569 | ZMB | 2023 | 190416.586825 | Zambia | Africa | Sub-Saharan Africa |
| 4570 | ZWE | 2023 | 5690.371581 | Zimbabwe | Africa | Sub-Saharan Africa |
4571 rows × 6 columns
forest_region = forest_region[forest_region['iso'] != 'XCA']
forest_region
| | iso | year | umd_tree_cover_loss__ha | country | continent_region | continent_sub_region |
|---|---|---|---|---|---|---|
| 0 | AFG | 2001 | 88.092712 | Afghanistan | Asia | Southern Asia |
| 1 | AGO | 2001 | 101220.621525 | Angola | Africa | Sub-Saharan Africa |
| 2 | AIA | 2001 | 3.878461 | Anguilla | Americas | Latin America and the Caribbean |
| 3 | ALA | 2001 | 396.934826 | Åland Islands | Europe | Northern Europe |
| 4 | ALB | 2001 | 3729.021031 | Albania | Europe | Southern Europe |
| ... | ... | ... | ... | ... | ... | ... |
| 4566 | XKO | 2023 | 1465.438575 | Kosovo | Europe | Southern Europe |
| 4567 | XNC | 2023 | 41.029104 | New Caledonia | Oceania | Melanesia |
| 4568 | ZAF | 2023 | 29571.219239 | South Africa | Africa | Sub-Saharan Africa |
| 4569 | ZMB | 2023 | 190416.586825 | Zambia | Africa | Sub-Saharan Africa |
| 4570 | ZWE | 2023 | 5690.371581 | Zimbabwe | Africa | Sub-Saharan Africa |
4551 rows × 6 columns
forest_region[forest_region['continent_region'].isna()]
| | iso | year | umd_tree_cover_loss__ha | country | continent_region | continent_sub_region |
|---|---|---|---|---|---|---|
The result is empty, which confirms that no rows are missing region information anymore. All entries now have valid and complete country and region information.
Step 5 - Emission Data
We are now ready to load the emission dataset for further analysis.
emission = pd.read_csv("energy.csv")
emission
| | Unnamed: 0 | Country | Energy_type | Year | Energy_consumption | Energy_production | GDP | Population | Energy_intensity_per_capita | Energy_intensity_by_GDP | CO2_emission |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | World | all_energy_types | 1980 | 292.899790 | 296.337228 | 27770.910281 | 4.298127e+06 | 68.145921 | 10.547000 | 4946.627130 |
| 1 | 1 | World | coal | 1980 | 78.656134 | 80.114194 | 27770.910281 | 4.298127e+06 | 68.145921 | 10.547000 | 1409.790188 |
| 2 | 2 | World | natural_gas | 1980 | 53.865223 | 54.761046 | 27770.910281 | 4.298127e+06 | 68.145921 | 10.547000 | 1081.593377 |
| 3 | 3 | World | petroleum_n_other_liquids | 1980 | 132.064019 | 133.111109 | 27770.910281 | 4.298127e+06 | 68.145921 | 10.547000 | 2455.243565 |
| 4 | 4 | World | nuclear | 1980 | 7.575700 | 7.575700 | 27770.910281 | 4.298127e+06 | 68.145921 | 10.547000 | 0.000000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 55435 | 55435 | Zimbabwe | coal | 2019 | 0.045064 | 0.075963 | 37.620400 | 1.465420e+04 | 11.508701 | 4.482962 | 4.586869 |
| 55436 | 55436 | Zimbabwe | natural_gas | 2019 | 0.000000 | 0.000000 | 37.620400 | 1.465420e+04 | 11.508701 | 4.482962 | 0.000000 |
| 55437 | 55437 | Zimbabwe | petroleum_n_other_liquids | 2019 | 0.055498 | 0.000000 | 37.620400 | 1.465420e+04 | 11.508701 | 4.482962 | 4.377890 |
| 55438 | 55438 | Zimbabwe | nuclear | 2019 | NaN | NaN | 37.620400 | 1.465420e+04 | 11.508701 | 4.482962 | 0.000000 |
| 55439 | 55439 | Zimbabwe | renewables_n_other | 2019 | 0.068089 | 0.067499 | 37.620400 | 1.465420e+04 | 11.508701 | 4.482962 | 0.000000 |
55440 rows × 11 columns
emission.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 55440 entries, 0 to 55439
Data columns (total 11 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Unnamed: 0 55440 non-null int64
1 Country 55440 non-null object
2 Energy_type 55440 non-null object
3 Year 55440 non-null int64
4 Energy_consumption 44287 non-null float64
5 Energy_production 44289 non-null float64
6 GDP 40026 non-null float64
7 Population 46014 non-null float64
8 Energy_intensity_per_capita 50358 non-null float64
9 Energy_intensity_by_GDP 50358 non-null float64
10 CO2_emission 51614 non-null float64
dtypes: float64(7), int64(2), object(2)
memory usage: 4.7+ MB
From the output of the info() function, we can see that the columns we intend to use (Country, CO2_emission, and Year) are stored with appropriate data types. To ensure consistency and avoid issues when merging the data, we will apply some minor formatting by renaming:
Year to year
Country to country
This step ensures uniform column naming across datasets.
emission.rename(columns={
'Year': 'year',
'Country': 'country'
}, inplace=True)
emission
| | Unnamed: 0 | country | Energy_type | year | Energy_consumption | Energy_production | GDP | Population | Energy_intensity_per_capita | Energy_intensity_by_GDP | CO2_emission |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | World | all_energy_types | 1980 | 292.899790 | 296.337228 | 27770.910281 | 4.298127e+06 | 68.145921 | 10.547000 | 4946.627130 |
| 1 | 1 | World | coal | 1980 | 78.656134 | 80.114194 | 27770.910281 | 4.298127e+06 | 68.145921 | 10.547000 | 1409.790188 |
| 2 | 2 | World | natural_gas | 1980 | 53.865223 | 54.761046 | 27770.910281 | 4.298127e+06 | 68.145921 | 10.547000 | 1081.593377 |
| 3 | 3 | World | petroleum_n_other_liquids | 1980 | 132.064019 | 133.111109 | 27770.910281 | 4.298127e+06 | 68.145921 | 10.547000 | 2455.243565 |
| 4 | 4 | World | nuclear | 1980 | 7.575700 | 7.575700 | 27770.910281 | 4.298127e+06 | 68.145921 | 10.547000 | 0.000000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 55435 | 55435 | Zimbabwe | coal | 2019 | 0.045064 | 0.075963 | 37.620400 | 1.465420e+04 | 11.508701 | 4.482962 | 4.586869 |
| 55436 | 55436 | Zimbabwe | natural_gas | 2019 | 0.000000 | 0.000000 | 37.620400 | 1.465420e+04 | 11.508701 | 4.482962 | 0.000000 |
| 55437 | 55437 | Zimbabwe | petroleum_n_other_liquids | 2019 | 0.055498 | 0.000000 | 37.620400 | 1.465420e+04 | 11.508701 | 4.482962 | 4.377890 |
| 55438 | 55438 | Zimbabwe | nuclear | 2019 | NaN | NaN | 37.620400 | 1.465420e+04 | 11.508701 | 4.482962 | 0.000000 |
| 55439 | 55439 | Zimbabwe | renewables_n_other | 2019 | 0.068089 | 0.067499 | 37.620400 | 1.465420e+04 | 11.508701 | 4.482962 | 0.000000 |
55440 rows × 11 columns
Step 6 - Final Data
data = pd.merge(emission, forest_region, on =["country", "year"], how='inner')
data
| | Unnamed: 0 | country | Energy_type | year | Energy_consumption | Energy_production | GDP | Population | Energy_intensity_per_capita | Energy_intensity_by_GDP | CO2_emission | iso | umd_tree_cover_loss__ha | continent_region | continent_sub_region |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 29112 | Afghanistan | all_energy_types | 2001 | 0.015914 | 0.007509 | 19.4201 | 21607.0 | 0.736543 | 0.819486 | 1.153149 | AFG | 88.092712 | Asia | Southern Asia |
| 1 | 29113 | Afghanistan | coal | 2001 | 0.000542 | 0.000515 | 19.4201 | 21607.0 | 0.736543 | 0.819486 | 0.001944 | AFG | 88.092712 | Asia | Southern Asia |
| 2 | 29114 | Afghanistan | natural_gas | 2001 | 0.001849 | 0.001849 | 19.4201 | 21607.0 | 0.736543 | 0.819486 | 0.451205 | AFG | 88.092712 | Asia | Southern Asia |
| 3 | 29115 | Afghanistan | petroleum_n_other_liquids | 2001 | 0.008037 | 0.000000 | 19.4201 | 21607.0 | 0.736543 | 0.819486 | 0.700000 | AFG | 88.092712 | Asia | Southern Asia |
| 4 | 29116 | Afghanistan | nuclear | 2001 | NaN | NaN | 19.4201 | 21607.0 | 0.736543 | 0.819486 | 0.000000 | AFG | 88.092712 | Asia | Southern Asia |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 19207 | 55435 | Zimbabwe | coal | 2019 | 0.045064 | 0.075963 | 37.6204 | 14654.2 | 11.508701 | 4.482962 | 4.586869 | ZWE | 11553.329511 | Africa | Sub-Saharan Africa |
| 19208 | 55436 | Zimbabwe | natural_gas | 2019 | 0.000000 | 0.000000 | 37.6204 | 14654.2 | 11.508701 | 4.482962 | 0.000000 | ZWE | 11553.329511 | Africa | Sub-Saharan Africa |
| 19209 | 55437 | Zimbabwe | petroleum_n_other_liquids | 2019 | 0.055498 | 0.000000 | 37.6204 | 14654.2 | 11.508701 | 4.482962 | 4.377890 | ZWE | 11553.329511 | Africa | Sub-Saharan Africa |
| 19210 | 55438 | Zimbabwe | nuclear | 2019 | NaN | NaN | 37.6204 | 14654.2 | 11.508701 | 4.482962 | 0.000000 | ZWE | 11553.329511 | Africa | Sub-Saharan Africa |
| 19211 | 55439 | Zimbabwe | renewables_n_other | 2019 | 0.068089 | 0.067499 | 37.6204 | 14654.2 | 11.508701 | 4.482962 | 0.000000 | ZWE | 11553.329511 | Africa | Sub-Saharan Africa |
19212 rows × 15 columns
We have successfully joined the deforestation and emission datasets using an inner join. This ensures that only the matching data between the two datasets is retained. The merged data is now ready for exploratory data analysis.
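One thing worth checking after this join: the emission table holds several rows per country and year (one per Energy_type, including an all_energy_types total, as the preview above shows), so each tree-cover value is repeated across those rows. The following is a minimal sketch, with made-up numbers and hypothetical names (emission_demo, forest_demo), of keeping only the total row before merging so that each country-year appears once. It is one possible approach, not the one used in this notebook.

```python
import pandas as pd

# Made-up emission rows: one total plus two per-fuel rows for the same country-year.
emission_demo = pd.DataFrame({
    "country": ["Zimbabwe"] * 3,
    "year": [2019] * 3,
    "Energy_type": ["all_energy_types", "coal", "natural_gas"],
    "CO2_emission": [9.0, 4.6, 0.0],
})
# Made-up forest row for the same country-year.
forest_demo = pd.DataFrame({
    "country": ["Zimbabwe"],
    "year": [2019],
    "umd_tree_cover_loss__ha": [11553.3],
})

# Keep only the aggregate row so each country-year is counted once.
totals_only = emission_demo[emission_demo["Energy_type"] == "all_energy_types"]

# validate="one_to_one" makes pandas raise if either side has duplicate keys.
merged_demo = totals_only.merge(
    forest_demo, on=["country", "year"], how="inner", validate="one_to_one"
)

rows_per_key = merged_demo.groupby(["country", "year"]).size()
print(rows_per_key.max())
```

Whether to filter this way depends on the question being asked; the analysis below keeps all energy types.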
Step 7 - Exploratory Data Analysis
Let’s begin by analyzing the trends in both deforestation and CO2 emissions across regions to find specific patterns.
forest_aggregate = data.groupby(['year', 'continent_region']).agg(
{'umd_tree_cover_loss__ha': 'sum', 'CO2_emission': 'sum'}
).reset_index()
fig = px.line(
forest_aggregate,
x="year",
y="umd_tree_cover_loss__ha",
color="continent_region",
labels={"year": "Year",
"umd_tree_cover_loss__ha": "Tree Cover Loss (ha)",
"continent_region": "Continent",
},
title="Deforestation by Region Over Time"
)
fig.show()
The chart highlights how deforestation varies across continents. The Americas show the highest levels of tree cover loss, with noticeable peaks in certain years that could reflect major deforestation events. Europe and Asia have moderate levels of deforestation, with some fluctuations over time. Africa shows a gradual but steady increase in tree cover loss, while Oceania remains relatively low throughout the period. These differences likely point to varying regional challenges and drivers behind deforestation.
fig = px.line(
forest_aggregate,
x="year",
y="CO2_emission",
color="continent_region",
labels={"year": "Year",
"CO2_emission": "CO2 Emission",
"continent_region": "Continent",
},
title="CO2 Emission by Region Over Time"
)
fig.show()
Insights from the CO2 Emission by Region Over Time Chart
Asia:
Asia’s CO2 emissions have been on a steady rise, making it the biggest contributor among all regions. This isn’t surprising given the rapid industrialization and energy demands of many countries in the region. As economies grow, so does the reliance on energy-intensive industries, which is clearly reflected in the numbers.
Americas:
In the Americas, CO2 emissions have stayed relatively stable, with only small fluctuations over time. This seems to show a balance: on one hand, industrial activities remain significant, but on the other, there’s been progress in adopting renewable energy and other mitigation measures to keep emissions in check.
Europe:
Europe’s emissions have either held steady or started to decline slightly. This likely reflects the impact of strong environmental policies and the gradual shift toward cleaner energy sources like wind and solar. It’s an example of how focused efforts can make a difference over time.
Africa:
Emissions in Africa are still low compared to other regions, but they are slowly creeping up. This aligns with the ongoing industrialization and urbanization across the continent. As economies grow, energy use increases, which contributes to this upward trend.
Oceania:
Oceania continues to record the lowest emissions of all the regions, and the numbers haven’t changed much over time. This could be because the region has fewer energy-intensive industries and a smaller population compared to places like Asia or the Americas.
Looking at the charts, we can see some interesting patterns when comparing deforestation and CO2 emissions across regions. In the Americas, deforestation has stayed consistently high over the years, with noticeable peaks around 2010 and 2018. However, what stands out is that CO2 emissions in the Americas haven’t followed the same trend; they’ve remained fairly stable. This suggests that other factors, like industrial activities or transportation, might be playing a bigger role in driving emissions here compared to deforestation.
Asia tells a different story. CO2 emissions have been steadily climbing, and there’s a moderate but consistent trend of deforestation as well. This could mean that activities like industrial expansion or agriculture, which often lead to deforestation, are also directly contributing to the region’s rising emissions.
Europe shows yet another dynamic. There’s very little deforestation happening, but CO2 emissions remain high and steady. This suggests that land-use changes aren’t a major factor for emissions in Europe; instead, it’s likely industries and transportation that are driving their numbers.
In Africa, there’s a gradual increase in both deforestation and CO2 emissions. This makes sense given population growth and the expansion of agriculture, which often go hand in hand with deforestation. Lastly, Oceania stands out with minimal deforestation and a relatively small contribution to global CO2 emissions, reflecting its limited overall impact.
What these charts show is that the relationship between deforestation and CO2 emissions varies a lot by region. In places like Asia and Africa, there’s a stronger connection between the two, while in regions like Europe and the Americas, emissions seem to be driven by other factors. It’s a reminder of how different each region’s story is when it comes to environmental challenges.
fig = px.scatter(
data,
x="umd_tree_cover_loss__ha",
y="CO2_emission",
color="continent_region",
trendline="ols",
title="Correlation Between Deforestation and CO2 Emission",
labels={
"umd_tree_cover_loss__ha": "Tree Cover Loss (ha)",
"CO2_emission": "CO2 Emission",
"continent_region": "Continent"
}
)
fig.show()
The scatterplot tells an interesting story about how deforestation and CO2 emissions relate to one another across different regions. In Asia, we see the largest spread of deforestation values, paired with the highest CO2 emissions. This makes sense given the region’s rapid industrial growth and large-scale land-use changes, which are major contributors to emissions.
The Americas also show a noticeable pattern. While deforestation values vary, CO2 emissions are clustered in the mid to high range. This suggests that deforestation, along with other industrial activities, plays a significant role in driving emissions in this region.
Europe, on the other hand, has a weaker connection between deforestation and emissions. Even as deforestation varies, CO2 emissions stay relatively steady. This might reflect Europe’s success in adopting cleaner energy and stronger environmental policies that reduce reliance on activities that drive emissions.
In Africa, both deforestation and CO2 emissions are relatively low. This lines up with the region’s slower pace of industrialization and smaller-scale land-use changes. Finally, Oceania has the lowest levels of both deforestation and CO2 emissions, emphasizing its smaller global footprint.
What stands out from the plot is how the relationship between deforestation and emissions changes depending on the region. In Asia and the Americas, the link is clear, while in Europe, Africa, and Oceania, other factors seem to have a stronger influence. This shows that tackling deforestation’s impact on emissions requires a tailored approach for each region.
Step 8 - Correlation between Variables
data['umd_tree_cover_loss__ha'].describe()
count 1.921200e+04
mean 1.133450e+05
std 4.514337e+05
min 0.000000e+00
25% 2.502050e+02
50% 5.715100e+03
75% 4.494546e+04
max 5.560431e+06
Name: umd_tree_cover_loss__ha, dtype: float64
data['CO2_emission'].describe()
count 19071.000000
mean 59.369662
std 414.280960
min -0.000138
25% 0.000000
50% 0.190589
75% 10.969902
max 10732.002367
Name: CO2_emission, dtype: float64
Looking at the numbers, both CO2 emissions and tree cover loss have heavily skewed distributions, with a few countries having extremely high values compared to the rest. For example, the maximum CO2 emission value is over 10,000, while the mean is only about 59 and the median about 0.19. This tells me that the mean is being pulled up by these outliers, so it doesn’t really represent what’s happening in most countries. The median, on the other hand, isn’t affected by these extremes, making it a much better choice for understanding the typical country’s emissions and deforestation. So, to show a clearer picture of what’s happening at the country level, I’ll go with the median.
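To make the point about outliers concrete, here is a small sketch with made-up values (sample is an illustrative series, not taken from the dataset). One extreme value drags the mean far from where most observations sit, while the median barely moves.

```python
import pandas as pd

# Made-up emissions: most values are small, one is extreme.
sample = pd.Series([0.0, 0.1, 0.2, 0.5, 1.0, 10000.0])

mean_value = sample.mean()      # pulled upward by the single large value
median_value = sample.median()  # middle of the sorted values, unaffected by it

print(f"mean={mean_value:.2f}, median={median_value:.2f}")
```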
country_median = data.groupby('country').agg({
'umd_tree_cover_loss__ha':'median',
'CO2_emission':'median'
}).reset_index()
country_median
| | country | umd_tree_cover_loss__ha | CO2_emission |
|---|---|---|---|
| 0 | Afghanistan | 97.749480 | 0.188230 |
| 1 | Albania | 1188.248196 | 0.085429 |
| 2 | Algeria | 5763.909736 | 1.657544 |
| 3 | Angola | 161718.969611 | 0.226583 |
| 4 | Antigua and Barbuda | 25.463260 | 0.000000 |
| ... | ... | ... | ... |
| 173 | Vanuatu | 377.560462 | 0.000000 |
| 174 | Venezuela | 101285.582103 | 21.393005 |
| 175 | Vietnam | 132423.224113 | 17.462579 |
| 176 | Zambia | 84831.582310 | 0.067808 |
| 177 | Zimbabwe | 9631.632143 | 0.950000 |
178 rows × 3 columns
fig = px.scatter(
country_median,
x= 'umd_tree_cover_loss__ha',
y='CO2_emission',
text='country',
labels={
'umd_tree_cover_loss__ha': 'Median Tree Cover Loss (ha)',
'CO2_emission': 'Median CO2 Emission'},
title='Median Tree Cover Loss vs Median CO2 Emission by Country 2001 - 2019')
fig.update_traces(marker=dict(size=10, opacity=0.7), textposition='top center')
fig.update_layout(
title_font_size=16,
xaxis_title='Median Tree Cover Loss (ha)',
yaxis_title='Median CO2 Emission',
template='plotly_white'
)
fig.show()
This chart gives a fascinating look at how tree cover loss and CO2 emissions compare across countries from 2001 to 2019. Starting with the United States, it’s clear that while the U.S. has high CO2 emissions, its tree cover loss is relatively low compared to countries like Russia or Brazil. This suggests that most of the U.S.’s emissions come from industries like energy and transportation rather than deforestation.
China tells a similar story. It has some of the highest CO2 emissions but very little tree cover loss. This reflects the country’s reliance on heavy industry and fossil fuels to drive its rapid economic growth. On the other hand, Russia stands out with significant tree cover loss and moderate emissions. This could be due to logging and land-use changes contributing to its numbers.
Brazil and Indonesia, however, are different. Both countries show high levels of tree cover loss, but their CO2 emissions are not as high as those of industrial giants like the U.S. or China. For Brazil, the Amazon’s deforestation is a major factor, while in Indonesia, activities like palm oil plantations play a big role.
Then there’s the cluster of countries near the lower end of the chart, where both tree cover loss and emissions are minimal. These are likely smaller or less industrialized nations with limited impact on global emissions.
What this chart really shows is how different each country’s story is. For some, emissions are driven by industrialization, while for others, it’s deforestation and land-use changes that make the difference. It’s a reminder that tackling emissions requires tailored solutions that consider each country’s unique challenges.
Step 9 - Correlation in Each Region
from scipy.stats import pearsonr
results = []
for region, group in forest_aggregate.groupby('continent_region'):
    corr, p_value = pearsonr(group['umd_tree_cover_loss__ha'], group['CO2_emission'])
    results.append({
        'continent_region': region,
        'correlation': corr,
        'p_value': p_value,
        'is_significant': 'Yes' if p_value < 0.05 else 'No'
    })
correlations = pd.DataFrame(results)
correlations
| | continent_region | correlation | p_value | is_significant |
|---|---|---|---|---|
| 0 | Africa | 0.905043 | 1.015996e-07 | Yes |
| 1 | Americas | -0.121429 | 6.204574e-01 | No |
| 2 | Asia | 0.766577 | 1.290445e-04 | Yes |
| 3 | Europe | -0.578872 | 9.407491e-03 | Yes |
| 4 | Oceania | 0.390274 | 9.854547e-02 | No |
correlations['significance_color'] = correlations['is_significant'].map({'Yes': 'green', 'No': 'red'})
fig = px.bar(
    correlations,
    x='continent_region',
    y='correlation',
    title="Correlation Between Tree Cover Loss and CO2 Emissions by Region (Significance Highlighted)",
    labels={
        "continent_region": "Region",
        "correlation": "Correlation Coefficient"
    },
    color='is_significant',
    text='correlation',
    # custom_data keeps each bar's significance label aligned with its own trace,
    # since color= splits the bars into one trace per significance value.
    custom_data=['is_significant'],
    color_discrete_map={'Yes': 'green', 'No': 'red'}
)
fig.update_traces(
    texttemplate='%{text:.2f} (%{customdata[0]})',
    textposition='outside'
)
fig.update_layout(
showlegend=True,
yaxis_title="Correlation Coefficient",
legend_title="Significance",
)
fig.show()
The numbers tell an interesting story about how deforestation and CO2 emissions are connected in different regions. In Africa, the correlation is strong at 0.91, and it’s statistically significant, showing that deforestation has a clear and direct impact on emissions. Similarly, Asia has a strong and significant correlation of 0.77, reflecting how activities like land-use changes and industrial expansion drive emissions. Europe is quite different, with a negative correlation of -0.58, which is also significant. This likely shows the impact of successful policies like reforestation and the shift to cleaner energy. In the Americas, the correlation is weak at -0.12 and not significant, suggesting that emissions here are influenced more by industry and transportation rather than deforestation. Oceania also has a weak correlation of 0.39, and it’s not significant either, highlighting the region’s smaller role in these dynamics. These numbers really show how each region has its own unique relationship between deforestation and emissions.
from scipy.stats import linregress
# Fit a separate simple linear regression of CO2 emissions on tree cover loss for each region
regions = forest_aggregate.groupby('continent_region')
simple_regression_results = []
for region, group in regions:
    slope, intercept, r_value, p_value, std_err = linregress(
        group['umd_tree_cover_loss__ha'], group['CO2_emission']
    )
    simple_regression_results.append({
        'region': region,
        'slope': slope,
        'intercept': intercept,
        'r_squared': r_value**2,
        'p_value': p_value,
        'significant': p_value < 0.05  # flag regions significant at the 0.05 level
    })
simple_regression_df = pd.DataFrame(simple_regression_results)
print(simple_regression_df)
region slope intercept r_squared p_value significant
0 Africa 0.000065 1540.419573 0.819104 1.015996e-07 True
1 Americas -0.000005 15763.538362 0.014745 6.204574e-01 False
2 Asia 0.001006 9540.867791 0.587640 1.290445e-04 True
3 Europe -0.000037 13189.972045 0.335093 9.407491e-03 True
4 Oceania 0.000009 854.346422 0.152314 9.854547e-02 False
fig = px.bar(
    simple_regression_df,
    x='region',
    y='r_squared',
    color='significant',  # Highlight statistically significant regions
    title="R-squared and Statistical Significance by Region",
    labels={'region': 'Region', 'r_squared': 'R-squared Value'},
    text='r_squared'
)
fig.update_traces(texttemplate='%{text:.2f}', textposition='outside')
fig.show()
This chart paints a clear picture of how deforestation and CO2 emissions are connected across regions.
Africa stands out with the highest R-squared value of 0.82, meaning tree cover loss explains about 82% of the variation in emissions, and this relationship is statistically significant.
In Asia, the R-squared is 0.59, which is also significant and shows a fairly strong link between deforestation and emissions, likely tied to land-use changes and industrial growth.
Europe is interesting because its R-squared is lower, at 0.34, but the relationship is still significant. Its slope is negative, so emissions fell as tree cover loss rose, which may reflect efforts like reforestation and cleaner energy.
For the Americas, the story is different: the R-squared is barely 0.01 and the relationship isn’t significant, suggesting that emissions here are driven by other sectors like transportation and industry.
Oceania has an R-squared of 0.15, which is also not significant, pointing to a weaker connection.
This really shows how the role of deforestation in emissions varies widely depending on the region and its unique dynamics.
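With a single predictor, R-squared is simply the square of the Pearson correlation, which is why the regression output repeats the p-values from the correlation table. The sketch below checks this in two ways: against the values reported above, and on synthetic data. The synthetic arrays are made up for illustration and are not the notebook's dataset.

```python
import numpy as np
from scipy.stats import linregress, pearsonr

# (correlation, R-squared) pairs copied from the two tables above
reported = {
    "Africa": (0.905043, 0.819104),
    "Americas": (-0.121429, 0.014745),
    "Asia": (0.766577, 0.587640),
    "Europe": (-0.578872, 0.335093),
    "Oceania": (0.390274, 0.152314),
}
for region, (r, r_squared) in reported.items():
    print(f"{region:<9} r^2={r**2:.6f}  reported R-squared={r_squared:.6f}")

# Synthetic example: 19 made-up yearly observations with a noisy linear link
rng = np.random.default_rng(0)
tree_cover_loss = rng.uniform(1e5, 5e5, size=19)
co2_emission = 1500 + 0.00007 * tree_cover_loss + rng.normal(0, 5, size=19)

fit = linregress(tree_cover_loss, co2_emission)
r, p = pearsonr(tree_cover_loss, co2_emission)

print(f"linregress R-squared: {fit.rvalue**2:.6f}, pearsonr r^2: {r**2:.6f}")
print(f"linregress p-value:   {fit.pvalue:.3e}, pearsonr p-value: {p:.3e}")
```

Because the two analyses carry the same information, the regression adds the slope and intercept but does not provide independent evidence beyond the correlation.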
Final Summary#
When I set out to explore how deforestation correlates with CO2 emissions globally, I wanted to understand whether cutting down forests, often for agriculture or industrial purposes, is truly contributing to climate change. Looking at the data, the connection is clear in some regions but less so in others, showing that the relationship between deforestation and emissions is more complex than I initially thought.
In regions like Africa and Asia, the story is straightforward. Africa has the strongest link, with an R-squared of 0.82, meaning tree cover loss explains most of the variation in CO2 emissions in the region. This makes sense, as land-use changes like agriculture are major contributors to emissions. Asia follows with an R-squared of 0.59, reflecting how industrial expansion and deforestation go hand in hand, fueling rising emissions. Both of these regions show statistically significant relationships, which is consistent with the idea that deforestation is a key driver of CO2 emissions here.
In Europe, the connection is weaker, with an R-squared of 0.34, though still statistically significant. Unlike in Africa and Asia, the relationship is negative: emissions fell even as tree cover loss rose. This suggests that policies like reforestation and sustainable land management have helped offset the impact of deforestation on emissions. In contrast, the Americas and Oceania tell a very different story. The relationship is weak, with R-squared values of 0.01 and 0.15, respectively, and neither is statistically significant. This points to other factors, like industrial and transportation emissions, playing a bigger role in these regions.
Deforestation is undoubtedly a driver of climate change, but its impact varies depending on the region. In places like Africa and Asia, where deforestation is closely tied to emissions, reducing forest loss could make a big difference in combating climate change. On the other hand, regions like Europe suggest that reforestation and strong environmental policies can weaken the link between deforestation and emissions. This journey from my initial question has shown me that tackling deforestation’s role in climate change requires region-specific strategies that balance global solutions with local realities.
Collaboration and Sources#
This project was completed independently.
I used ChatGPT to review my code, verify the logic, enhance data visualizations, and ensure the insights were clear and concise.